{
 "metadata": {
  "name": "",
  "signature": "sha256:0c181f9f96b3b7fd14b5766e253aae3c37185ea81cfae909427b03d9993961fd"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Visual Categorization with Bags of Keypoints in Shogun"
     ]
    },
    {
     "cell_type": "heading",
     "level": 4,
     "metadata": {},
     "source": [
      "By Abhijeet Kislay (GitHub ID: <a href='https://github.com/kislayabhi'>kislayabhi</a>) as a GSoC'14 project under Kevin Hughes(GitHub ID: <a href='https://github.com/pickle27'>pickle27</a>)"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This notebook is about performing Object Categorization using <a href=\"http://en.wikipedia.org/wiki/Scale-invariant_feature_transform\">SIFT</a> descriptors of keypoints as features, and <a href=\"http://en.wikipedia.org/wiki/Support_vector_machine\">SVM</a>s to predict the category of the object present in the image. Shogun's <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CKMeans.html\">K-Means clustering </a> is employed for generating the <a href=\"http://en.wikipedia.org/wiki/Bag-of-words_model_in_computer_vision\">bag of keypoints</a> and its <a href=\"http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm\">k-nearest neighbours</a> <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CKNN.html\">module</a> is extensively used to construct the feature vectors. "
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Background"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This notebook presents a bag of keypoints approach to visual categorization. A bag of keypoints corresponds to a histogram of the number of occurences of particular image patterns in a given image.The main advantages of the method are its simplicity, its computational efficiency and its invariance to affine transformations, as well as occlusion, lighting and intra-class variations.\n"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Strategy"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "***1. Compute (SIFT) descriptors at keypoints in all the template images and pool all of them together***"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "SIFT"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "SIFT extracts keypoints and computes its descriptors. It requires the following steps to be done:\n",
      "* **Scale-space Extrema Detection**: Difference of Gaussian (DOG) are used to search for local extrema over scale and space.\n",
      "* **Keypoint Localization**: Once potential keypoints are found, we refine them by eliminating low-contrast keypoints and edge keypoints.\n",
      "* **Orientation Assignment**: Now an orientation is assigned to each keypoint to achieve invariance to image rotation.\n",
      "* **Keypoint Descriptor**: Now a keypoint descriptor is created. A total of 128 elements are available for each keypoint.\n",
      "\n",
      "To get more details about SIFT in OpenCV, do read OpenCV python documentation <a href=\"http://opencv-python-tutroals.readthedocs.org/en/latest/py_tutorials/py_feature2d/py_sift_intro/py_sift_intro.html\">here</a>.\n",
      "\n",
      "OpenCV has a nice API for using SIFT. Let's see what we are looking at:"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "#import Opencv library\n",
      "try:\n",
      "    import cv2\n",
      "except ImportError:\n",
      "    print \"You must have OpenCV installed\"\n",
      "    exit(1)\n",
      "\n",
      "#check the OpenCV version\n",
      "try:\n",
      "    v=cv2.__version__\n",
      "    assert (tuple(map(int,v.split(\".\")))>(2,4,2))\n",
      "except (AssertionError, ValueError):\n",
      "    print \"Install newer version of OpenCV than 2.4.2, i.e from 2.4.3\"\n",
      "    exit(1)\n",
      "    \n",
      "import numpy as np\n",
      "import matplotlib.pyplot as plt\n",
      "%matplotlib inline\n",
      "from modshogun import *\n",
      "\n",
      "# get the list of all jpg images from the path provided\n",
      "import os\n",
      "def get_imlist(path):\n",
      "    return [[os.path.join(path,f) for f in os.listdir(path) if (f.endswith('.jpg') or f.endswith('.png'))]]\n",
      "\n",
      "#Use the following function when reading an image through OpenCV and displaying through plt.\n",
      "def showfig(image, ucmap):\n",
      "    #There is a difference in pixel ordering in OpenCV and Matplotlib.\n",
      "    #OpenCV follows BGR order, while matplotlib follows RGB order.\n",
      "    if len(image.shape)==3 :\n",
      "        b,g,r = cv2.split(image)       # get b,g,r\n",
      "        image = cv2.merge([r,g,b])     # switch it to rgb\n",
      "    imgplot=plt.imshow(image, ucmap)\n",
      "    imgplot.axes.get_xaxis().set_visible(False)\n",
      "    imgplot.axes.get_yaxis().set_visible(False)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We try to construct the vocabulary from a set of template images. It is a set of three general images belonging to the category of car, plane and train.  \n",
      "\n",
      "OpenCV also provides **cv2.drawKeyPoints()** function which draws the small circles on the locations of keypoints. If you pass a flag, **cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS** to it, it will draw a circle with size of keypoint and it will even show its orientation. See below example."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "plt.rcParams['figure.figsize'] = 17, 4\n",
      "filenames=get_imlist('../../../data/SIFT/template/')\n",
      "filenames=np.array(filenames)\n",
      "\n",
      "# for keeping all the descriptors from the template images\n",
      "descriptor_mat=[]\n",
      "\n",
      "# initialise OpenCV's SIFT\n",
      "sift=cv2.SIFT()\n",
      "fig = plt.figure()\n",
      "plt.title('SIFT detected Keypoints')\n",
      "plt.xticks(())\n",
      "plt.yticks(())\n",
      "for image_no in xrange(3):\n",
      "    img=cv2.imread(filenames[0][image_no])\n",
      "    img=cv2.resize(img, (500, 300), interpolation=cv2.INTER_AREA)\n",
      "    gray=cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)\n",
      "    gray=cv2.equalizeHist(gray)\n",
      "    \n",
      "    #detect the SIFT keypoints and the descriptors.\n",
      "    kp, des=sift.detectAndCompute(gray,None)\n",
      "    # store the descriptors.\n",
      "    descriptor_mat.append(des)\n",
      "    # here we draw the keypoints\n",
      "    img=cv2.drawKeypoints(img, kp, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)\n",
      "    fig.add_subplot(1, 3, image_no+1)\n",
      "    showfig(img, None)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "***2. Group similar descriptors into an arbitrary number of clusters***.\n",
      "\n",
      "We take all the descriptors that we got from the three images above and find similarity in between them.\n",
      "Here, similarity is decided by <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CEuclideanDistance.html\">Euclidean distance</a> between the 128-element SIFT descriptors. Similar descriptors are clustered into **k** number of groups. This can be done using Shogun's <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CKNN.html\">**KMeans class**</a>. These clusters are called **bags of keypoints** or **visual words** and they collectively represent the **vocabulary** of the program. Each cluster has a **cluster center**, which can be thought of as the representative descriptor of all the descriptors belonging to that cluster. These cluster centers can be found using the <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CKMeans.html#a5d8a09aeadada018747786a5470d3653\">**get_cluster_centers()**</a> method.\n",
      "\n",
      "To perform clustering into **k** groups, we define the **get_similar_descriptors()** function below."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def get_similar_descriptors(k, descriptor_mat):\n",
      "\n",
      "    descriptor_mat=np.double(np.vstack(descriptor_mat))\n",
      "    descriptor_mat=descriptor_mat.T\n",
      "\n",
      "    #initialize KMeans in Shogun \n",
      "    sg_descriptor_mat_features=RealFeatures(descriptor_mat)\n",
      "\n",
      "    #EuclideanDistance is used for the distance measurement.\n",
      "    distance=EuclideanDistance(sg_descriptor_mat_features, sg_descriptor_mat_features)\n",
      "\n",
      "    #group the descriptors into k clusters.\n",
      "    kmeans=KMeans(k, distance)\n",
      "    kmeans.train()\n",
      "\n",
      "    #get the cluster centers.\n",
      "    cluster_centers=(kmeans.get_cluster_centers())\n",
      "    \n",
      "    return cluster_centers"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "cluster_centers=get_similar_descriptors(100, descriptor_mat)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "***3. Now, compute training data for the SVM classifiers. ***.\n",
      "\n",
      "Since we have already constructed the vocabulary, our next step is to generate viable feature vectors which can be used to represent each training image so that we can use them for multiclass classification later in the code. \n",
      "\n",
      "    \n",
      "   * We begin by computing **SIFT** descriptors for each training image.\n",
      "  \n",
      "   \n",
      "   * For each training image, associate each of its descriptors with one of the clusters in the vocabulary. The simplest way to do this is by using **k-Nearest Neighbour** approach. This can be done using Shogun's  <a href=\"http://shogun-toolbox.org/doc/en/3.0.0/classshogun_1_1CKNN.html\">KNN class</a>. <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CEuclideanDistance.html\">Euclidean distance</a> measure is used here for finding out the neighbours.\n",
      "   \n",
      "   \n",
      "   * Making a histogram from this association. This histogram has as many bins as there are clusters in the vocabulary. Each bin counts how many descriptors in the training image are associated with the cluster corresponding to that bin. Intuitively, this histogram describes the image in the **visual words** of the **vocabulary,** and is called the **bag of visual words descriptor** of the image. \n",
      "   \n",
      "In short, we approximated each training image into a **k** element vector. This can be utilized to train any multiclass classifier.\n",
      "\n",
      "\n",
      "First, let us see a few training images"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# name of all the folders together\n",
      "folders=['cars','planes','trains']\n",
      "training_sample=[]\n",
      "for folder in folders:\n",
      "    #get all the training images from a particular class \n",
      "    filenames=get_imlist('../../../data/SIFT/%s'%folder)\n",
      "    for i in xrange(10):\n",
      "        temp=cv2.imread(filenames[0][i])\n",
      "        training_sample.append(temp)\n",
      "\n",
      "plt.rcParams['figure.figsize']=21,16\n",
      "fig=plt.figure()\n",
      "plt.xticks(())\n",
      "plt.yticks(())\n",
      "plt.title('10 training images for each class')\n",
      "for image_no in xrange(30):\n",
      "    fig.add_subplot(6,5, image_no+1)\n",
      "    showfig(training_sample[image_no], None)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We here define **get_sift_training()** function to get all the **SIFT** descriptors present in all the training images."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def get_sift_training():\n",
      "    \n",
      "    # name of all the folders together\n",
      "    folders=['cars','planes','trains']\n",
      "    \n",
      "    folder_number=-1\n",
      "    des_training=[]\n",
      "      \n",
      "    for folder in folders:\n",
      "        folder_number+=1\n",
      "\n",
      "        #get all the training images from a particular class \n",
      "        filenames=get_imlist('../../../data/SIFT/%s'%folder)\n",
      "        filenames=np.array(filenames)\n",
      "        \n",
      "        des_per_folder=[]\n",
      "        for image_name in filenames[0]:\n",
      "            img=cv2.imread(image_name)\n",
      "\n",
      "            # carry out normal preprocessing routines\n",
      "            gray= cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)\n",
      "            gray=cv2.resize(gray, (500, 300), interpolation=cv2.INTER_AREA)\n",
      "            gray=cv2.equalizeHist(gray)\n",
      "\n",
      "            #get all the SIFT descriptors for an image\n",
      "            _, des=sift.detectAndCompute(gray, None)\n",
      "            des_per_folder.append(des)\n",
      "    \n",
      "        des_training.append(des_per_folder)\n",
      "    return des_training"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "descriptor_training=get_sift_training()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We define the **compute_training_data()** function which returns the training data required for multiclass classification in the later stages.\n",
      "\n",
      "Inputs are:\n",
      "\n",
      "* **k**=number of clusters\n",
      "\n",
      "* **cluster_centers**=descriptors that are approximate form of all descriptors belonging to a particular cluster\n",
      "\n",
      "* **descriptors**=SIFT descriptors of the training images that are obtained from the above function."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def compute_training_data(k, cluster_centers, descriptors):\n",
      "    \n",
      "    # a list to hold histograms of all the training images\n",
      "    all_histograms=[]\n",
      "    # labels for all of the test images\n",
      "    final_labels=[]\n",
      "    # to hold the cluster number a descriptor belong to\n",
      "    cluster_labels=[]\n",
      "\n",
      "    #initialize a KNN in Shogun\n",
      "    dist=EuclideanDistance()\n",
      "    labels=MulticlassLabels(np.double(range(k)))\n",
      "    knn=KNN(1, dist, labels)\n",
      "\n",
      "    #Target descriptors are the cluster_centers that we got earlier. \n",
      "    #All the descriptors of an image are matched against these for \n",
      "    #calculating the histogram.\n",
      "    sg_cluster_centers=RealFeatures(cluster_centers)\n",
      "    knn.train(sg_cluster_centers)\n",
      "\n",
      "    # name of all the folders together\n",
      "    folders=['cars','planes','trains']\n",
      "    folder_number=-1\n",
      "\n",
      "    for folder in folders:\n",
      "        folder_number+=1\n",
      "\n",
      "        #get all the training images from a particular class \n",
      "        filenames=get_imlist('../../../data/SIFT/%s'%folder)\n",
      "\n",
      "        for image_name in xrange(len(filenames[0])):\n",
      "            \n",
      "            des=descriptors[folder_number][image_name]\n",
      "            \n",
      "            #Shogun works in a way in which columns are samples and rows are features.\n",
      "            #Hence we need to transpose the observation matrix\n",
      "            des=(np.double(des)).T\n",
      "            sg_des=RealFeatures(np.array(des))\n",
      "\n",
      "            #find all the labels of cluster_centers that are nearest to the descriptors present in the current image. \n",
      "            cluster_labels=(knn.apply_multiclass(sg_des)).get_labels()\n",
      "\n",
      "            histogram_per_image=[]\n",
      "            for i in xrange(k):\n",
      "                #find the histogram for the current image\n",
      "                histogram_per_image.append(sum(cluster_labels==i))\n",
      "\n",
      "            all_histograms.append(np.array(histogram_per_image))\n",
      "            final_labels.append(folder_number)\n",
      "\n",
      "    # we now have the training features(all_histograms) and labels(final_labels) \n",
      "    all_histograms=np.array(all_histograms)\n",
      "    final_labels=np.array(final_labels)\n",
      "    return all_histograms, final_labels, knn"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "all_histograms, final_labels, knn=compute_training_data(100, cluster_centers, descriptor_training)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We have to solve a multiclass classification problem here. In Shogun these are implemented in: <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CMulticlassMachine.html\">MulticlassMachine</a> \n",
      "\n",
      "***4. We train a one-vs-all SVM for each category of object using the training data***:\n",
      "\n",
      "The following function returns a trained <a href=\"http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CGMNPSVM.html\">GMNPSVM</a>, which is a true Multiclass SVM in Shogun employing **one vs rest** approach.\n",
      "Inputs are:\n",
      "* **all_histograms**=Can be thought as the feature vector for all images for which the SVM has to be trained.\n",
      "* **final_labels**=The labels respective of the above mentioned feature vectors"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def train_svm(all_histograms, final_labels):\n",
      "    \n",
      "    # we will use GMNPSVM class of Shogun for one vs rest multiclass classification\n",
      "    obs_matrix=np.double(all_histograms.T)\n",
      "    sg_features=RealFeatures(obs_matrix)\n",
      "    sg_labels=MulticlassLabels(np.double(final_labels))\n",
      "    kernel=LinearKernel(sg_features, sg_features)\n",
      "    C=1\n",
      "    gsvm=GMNPSVM(C, kernel, sg_labels)\n",
      "    _=gsvm.train(sg_features)\n",
      "    return gsvm"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "gsvm=train_svm(all_histograms, final_labels)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "***5. Now, classify by using the trained SVM***:\n",
      "\n",
      "First let us see all the test images"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Lets see the testing images\n",
      "testing_sample=[]\n",
      "#get all the testing images  \n",
      "filenames=get_imlist('../../../data/SIFT/test_image/')\n",
      "for i in xrange(len(filenames[0])):\n",
      "    temp=cv2.imread(filenames[0][i])\n",
      "    testing_sample.append(temp)\n",
      "\n",
      "plt.rcParams['figure.figsize']=20,8\n",
      "fig=plt.figure()\n",
      "plt.xticks(())\n",
      "plt.yticks(())\n",
      "plt.title('Test Images')\n",
      "for image_no in xrange(len(filenames[0])):\n",
      "    fig.add_subplot(3,8, image_no+1)\n",
      "    showfig(testing_sample[image_no], None)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We define the function **get_sift_testing()** which returns all the descriptors present in the testing images."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def get_sift_testing():\n",
      "    filenames=get_imlist('../../../data/SIFT/test_image/')\n",
      "    filenames=np.array(filenames)\n",
      "    des_testing=[]\n",
      "    for image_name in filenames[0]:\n",
      "        result=[]\n",
      "        #read the test image\n",
      "        img=cv2.imread(image_name)\n",
      "\n",
      "        #follow the normal preprocessing routines \n",
      "        gray= cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)\n",
      "        gray=cv2.resize(gray, (500, 300), interpolation=cv2.INTER_AREA)\n",
      "        gray=cv2.equalizeHist(gray)\n",
      "\n",
      "        #compute all the descriptors of the test images\n",
      "        _, des=sift.detectAndCompute(gray, None)\n",
      "        des_testing.append(des)\n",
      "    return des_testing"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "descriptor_testing=get_sift_testing()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "In the following **classify_svm()** function, we use the trained **GMNPSVM** for classifying the test images. It returns the predictions from our trained SVM."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def classify_svm(k, knn, des_testing):\n",
      "    \n",
      "    # a list to hold histograms of all the test images\n",
      "    all_histograms=[]\n",
      "    filenames=get_imlist('../../../data/SIFT/test_image/')\n",
      "    \n",
      "    for image_name in xrange(len(filenames[0])):\n",
      "        \n",
      "        result=[]\n",
      "        des=des_testing[image_name]\n",
      "        \n",
      "        #Shogun works in a way in which columns are samples and rows are features.\n",
      "        #Hence we need to transpose the observation matrix\n",
      "        des=(np.double(des)).T\n",
      "        sg_des=RealFeatures(np.array(des))\n",
      "\n",
      "        #cluster all the above found descriptors into the vocabulary\n",
      "        cluster_labels=(knn.apply_multiclass(sg_des)).get_labels()\n",
      "\n",
      "        #get the histogram for the current test image\n",
      "        histogram=[]\n",
      "        for i in xrange(k):\n",
      "            histogram.append(sum(cluster_labels==i))\n",
      "        \n",
      "        all_histograms.append(np.array(histogram))\n",
      "\n",
      "    all_histograms=np.double(np.array(all_histograms))\n",
      "    all_histograms=all_histograms.T\n",
      "    sg_testfeatures=RealFeatures(all_histograms)\n",
      "    return gsvm.apply(sg_testfeatures).get_labels()"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "predicted=classify_svm(100, knn, descriptor_testing)\n",
      "print \"the predicted labels for k=100 are as follows: \"\n",
      "print predicted"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "6.\n",
      "***Selecting the classifier that gives the best overall classification accuracy with respect to number of clusters (k)*** :\n",
      "\n",
      "We define the function **create_conf_matrix()** which creates the confusion matrix. \n",
      "\n",
      "Inputs are:\n",
      "* **expected**=The actual labels which the test images belong to\n",
      "* **predicted**=The output of our SVM\n",
      "* **n_classes**=number of classes (here 3 i.e cars, trains and planes)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def create_conf_matrix(expected, predicted, n_classes):\n",
      "    m = [[0] * n_classes for i in range(n_classes)]\n",
      "    for pred, exp in zip(predicted, expected):\n",
      "        m[exp][int(pred)] += 1\n",
      "    return np.array(m)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Form the **expected** list. \n",
      "\n",
      "* **0** represents **cars**\n",
      "* **1** represents **planes**\n",
      "* **2** represents **trains**"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "import re\n",
      "filenames=get_imlist('../../../data/SIFT/test_image/')\n",
      "# get the formation of the files, later to be used for calculating the confusion matrix\n",
      "formation=([int(''.join(x for x in filename if x.isdigit())) for filename in filenames[0]])\n",
      "    \n",
      "# associate them with the correct labels by making a dictionary\n",
      "keys=range(len(filenames[0]))\n",
      "\n",
      "values=[0,1,0,2,1,0,1,0,0,0,1,2,2,2,2,1,1,1,1,1]\n",
      "label_dict=dict(zip(keys, values))\n",
      "\n",
      "# the following list holds the actual labels\n",
      "expected=[]\n",
      "for i in formation:\n",
      "    expected.append(label_dict[i-1])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We extend all the steps that we did for **k**=100 to few other values of **k** and check their accuracies with respect to the **expected** labels. Alongside, we also draw their respective **confusion matrix**."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "best_k=1\n",
      "max_accuracy=0\n",
      "\n",
      "for k in xrange(1,5):\n",
      "    k=100*k\n",
      "    \n",
      "    # step 2\n",
      "    cluster_centers=get_similar_descriptors(k, descriptor_mat)\n",
      "    \n",
      "    # step 3\n",
      "    all_histograms, final_labels, knn=compute_training_data(k, cluster_centers, descriptor_training)\n",
      "    \n",
      "    # step 4\n",
      "    gsvm=train_svm(all_histograms, final_labels)\n",
      "    \n",
      "    # step 5\n",
      "    predicted=classify_svm(k, knn, descriptor_testing)\n",
      "    accuracy=sum(predicted==expected)*100/float(len(expected))\n",
      "    print \"for a k=%d, accuracy is %d%%\"%(k, accuracy)\n",
      "    \n",
      "    #step 6\n",
      "    m=create_conf_matrix(expected, predicted, 3)\n",
      "\n",
      "    if accuracy>max_accuracy:\n",
      "        best_k=k\n",
      "        max_accuracy=accuracy\n",
      "        best_prediction=predicted\n",
      "    \n",
      "    print \"confusion matrix for k=%d\"%k\n",
      "    print m"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "From all the above k's we choose the one which has the best accuracy. Number of k's can be extended further to enhance the overall accuracy.\n",
      "\n",
      "Test images along with their predicted labels are shown below for the most optimum value of k: "
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "plt.rcParams['figure.figsize']=20,8\n",
      "fig=plt.figure()\n",
      "for image_no in xrange(len(filenames[0])):\n",
      "    fig.add_subplot(3,8, image_no+1)\n",
      "    plt.title('pred. class: '+folders[int(best_prediction[image_no])])\n",
      "    showfig(testing_sample[image_no], None)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Conclusion"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Here we have presented a simple but novel approach to generic visual categorization using feature vectors constructed from clustered descriptors of image patches."
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "References:"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* Visual Categorization with Bags of Keypoints by Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, C\u00e9dric Bray\n",
      "\n",
      "* Distinctive Image Features from Scale-Invariant Keypoints by David G. Lowe\n",
      "\n",
      "* Practical OpenCV by Samarth Brahmbhatt, University of Pennsylvania "
     ]
    }
   ],
   "metadata": {}
  }
 ]
}